Recurrent Scene Parsing with Perspective Understanding in the Loop
نویسندگان
چکیده
Objects may appear at arbitrary scales in perspective images of a scene, posing a challenge for recognition systems that process an image at a fixed resolution. We propose a depth-aware gating module that adaptively chooses the pooling field size in a convolutional network architecture according to the object scale (inversely proportional to the depth) so that small details can be preserved for objects at distance and a larger receptive field can be used for objects nearer to the camera. The depth gating signal is provided from stereo disparity (when available) or estimated directly from a single image. We integrate this depth-aware gating into a recurrent convolutional neural network trained in an end-to-end fashion to perform semantic segmentation. Our recurrent module iteratively refines the segmentation results, leveraging the depth estimate and output prediction from the previous loop. Through extensive experiments on three popular large-scale RGB-D datasets, we demonstrate our approach achieves competitive semantic segmentation performance using more compact model than existing methods. Interestingly, we find segmentation performance improves when we estimate depth directly from the image rather than using “ground-truth” and the model produces state-of-the-art results for quantitative depth estimation from a single image.
منابع مشابه
Hierarchical Feature For Scene Parsing Using Fully Recurrent Network
In scene parsing, the wide-range contextual information is not effectively encoded. Scene parsing provides segmentation and determines an scene into different regions associated with semantic categories. The main objective of scene parsing is to reduce semantic gap between humans and computer machines on scene understanding. The scenes parsing applications are object detection, text detection o...
متن کاملMulti-Path Feedback Recurrent Neural Networks for Scene Parsing
In this paper, we consider the scene parsing problem and propose a novel MultiPath Feedback recurrent neural network (MPF-RNN) for parsing scene images. MPF-RNN can enhance the capability of RNNs in modeling long-range context information at multiple levels and better distinguish pixels that are easy to confuse. Different from feedforward CNNs and RNNs with only single feedback, MPFRNN propagat...
متن کاملMulti-Path Feedback Recurrent Neural Network for Scene Parsing
In this paper, we consider the scene parsing problem. We propose a novel Multi-Path Feedback recurrent neural network (MPF-RNN) to enhance the capability of RNNs on modeling long-range context information at multiple levels and better distinguish pixels that are easy to confuse in pixel-wise classification. In contrast to CNNs without feedback and RNNs with only a single feedback path, MPFRNN p...
متن کاملGeometric Scene Parsing with Hierarchical LSTM
This paper addresses the problem of geometric scene parsing, i.e. simultaneously labeling geometric surfaces (e.g. sky, ground and vertical plane) and determining the interaction relations (e.g. layering, supporting, siding and affinity) between main regions. This problem is more challenging than the traditional semantic scene labeling, as recovering geometric structures necessarily requires th...
متن کامل3D Scene and Object Classification Based on Information Complexity of Depth Data
In this paper the problem of 3D scene and object classification from depth data is addressed. In contrast to high-dimensional feature-based representation, the depth data is described in a low dimensional space. In order to remedy the curse of dimensionality problem, the depth data is described by a sparse model over a learned dictionary. Exploiting the algorithmic information theory, a new def...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1705.07238 شماره
صفحات -
تاریخ انتشار 2017